A guide to fitting left-censored models with R

Author: Max J. Oosterwegel
Published: April 21, 2025
In environmental epidemiology – and environmental studies more generally – we often aim to detect trace levels of pollutants, metabolites, or biomarkers, for example with liquid or gas chromatography–mass spectrometry instruments, whether in a biospecimen like blood or urine or in environmental media such as soil, water, and air. Even though these instruments are very sensitive these days, they will sometimes hit their detection limits. This blog post briefly goes over how to handle such data statistically, and what options are available in R to fit these models.

Censoring

Every epidemiologist is familiar with the standard survival analysis example that describes right-censored observations. In this example, some participants are lost to follow-up or do not experience the event during the study. As a result, for these participants we only know that the event, if it occurred at all, happened somewhere after our observation period. These observations are then said to be right-censored at the end of this window.

With the aforementioned instruments, the situation is flipped: we only know that a value lies below a certain threshold. Below that point, the instrument cannot discern the analyte from background noise. In other words, measurements on such an instrument are limited to a minimum measurement of \(l\), so any measurement smaller than \(l\) is recorded as \(l\). We refer to this minimum measurement as the limit of detection or limit of quantification, and the data is said to be left-censored at this limit1. Measurements smaller than this simply cannot be measured with this instrument.2

Writing this in a more formal way, we can specify a latent regression model with an outcome that is either observed or unobserved:

\[ y_i^*= \begin{cases}y_i & \text { if } y_i>l \\ l & \text { if } y_i \leq l \end{cases} \] where \(y_i\) represents the ‘true’, latent values, \(y_i^*\) are values we observe in reality, and \(l\) is the limit of detection3.

In most environmental studies (and the classical tobit model from economics)4 we then assume a normally distributed error term for the true values \(y_i\) so we write

\[ y_i \sim \mathrm{~N}\left(a+b x_i, \sigma^2\right), \text { for } i=1, \ldots, n \] We can then estimate the parameters of the underlying distribution using the observed data \(y_i^*\) by including a separate term for the censoring mechanism in the likelihood:

\[ \operatorname{Pr}\left(y_i \leq l\right)=\int_{-\infty}^{l} \mathrm{N}\left(y_i \mid a+b x_i, \sigma^2\right) \mathrm{d} y_i= \Phi\left(\frac{l - (a+b x_i)}{\sigma}\right) \] such that the two terms of the likelihood become \[ p\left(y_i^*\right)= \begin{cases}\mathrm{N}\left(y_i^* \mid a+b x_i, \sigma^2\right) & \text { if } y_i^*>l \\ \Phi\left(\frac{l - (a+b x_i)}{\sigma}\right) & \text { if } y_i^* = l \end{cases} \] with the likelihood for the uncensored data points just being the normal probability density function.
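Summing the logarithm of these contributions over all observations gives the log-likelihood that we will maximize numerically later on:

\[ \log L(a, b, \sigma) = \sum_{i:\, y_i^* > l} \log \mathrm{N}\left(y_i^* \mid a+b x_i, \sigma^2\right) + \sum_{i:\, y_i^* = l} \log \Phi\left(\frac{l - (a+b x_i)}{\sigma}\right) \]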

Visually, the densities of \(y\) and \(y^*\) look as follows:

n <- 100000
prop_censored <- 0.3
y <- rnorm(n)
censor_point <- quantile(y, prop_censored)
censored <- y <= censor_point
y_obs <- ifelse(censored, censor_point, y)
y_dens <- density(y, n = n)
y_obs_dens <- density(y_obs, n = n)

library(ggplot2)
df_y_dens <- tibble::tibble(dens_x = y_dens$x,
                            dens_y = y_dens$y,
                            censor_point = censor_point,
                            type = "true")

df_y_dens_obs <- tibble::tibble(dens_x = y_obs_dens$x,
                                dens_y = y_obs_dens$y,
                                censor_point = censor_point,
                                type = "observed")

theme_set(theme_minimal() +
            theme(axis.line = element_line(colour = "black", linewidth = rel(1)),
                  panel.border = element_blank()))

ggplot(mapping = aes(x = dens_x, y = dens_y)) +
  geom_line(data = df_y_dens, linewidth = 0.9,
            aes(color = "True values"), linetype = "dotted") +
  geom_line(data = subset(df_y_dens_obs, dens_x > censor_point), 
            linewidth = 0.9,
            aes(color = "Observed values")) +
  geom_vline(aes(xintercept = censor_point, color = "Censor point"), 
             linetype = "dashed", linewidth = 1) +
  geom_area(data = subset(df_y_dens, dens_x <= censor_point),
            fill = "#822D2F",
            aes(color = "Censored values")) +  
  labs(x = "Value", y = "Density") +
  scale_color_manual(name = "",
                     values = c("True values" = "black", 
                                "Observed values" = "#375A6D", 
                                "Censor point" = "black", 
                                "Censored values" = "#822D2F")) +
  theme_minimal() +
  theme(axis.title = element_blank(),  
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        legend.position = c(0.1, 0.8))

So, informally speaking: because the data is censored and not truncated, we know how much probability mass lies below the limit. When we combine this with the information from the uncensored observations and the distributional assumption we made, we can ‘spread out’ that probability mass across the censored region and estimate the parameters of the latent/true data structure.
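As a quick sanity check (a small sketch, not in the original post): the fraction of censored observations equals the probability mass a standard normal places below the censoring point – exactly the information the \(\Phi\) term feeds into the likelihood.

# both should be roughly equal to prop_censored (0.3)
mean(censored)
pnorm(censor_point)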

If we additionally simulate an effect of \(x\) on \(y\) and plot \(x\) versus \(y\), we can easily see that ignoring the censoring mechanism (or substituting a constant value for observations below the limit of detection, as is often done – not shown) biases the estimate of the slope.

n <- 2000
prop_censored <- 0.3
b_1 <- 0.1
sd <- 0.5

x <- rnorm(n)
y <-  rnorm(n, x * b_1, sd)
censor_point <- quantile(y, prop_censored)
censored <- y <= censor_point
y_obs <- ifelse(censored, censor_point, y)

df <- tibble::tibble(y, y_obs, x, censor_point)
df |> 
  ggplot() + 
  geom_point(aes(x = x, y = y_obs), color = '#375A6D') +
  geom_point(data = subset(df, y <= censor_point),
             aes(x = x, y = y), color = '#822D2F') +
  geom_smooth(aes(x = x, y = y_obs, color = "Observed values fit"), 
              method = "lm", linetype = "solid") +
  geom_smooth(aes(x = x, y = y, color = "True values fit"), 
              method = 'lm', linetype = "dotted") +
  scale_color_manual(name = "",
                     values = c("True values fit" = "black", "Observed values fit" = "black")) +
  labs(x = expression(x),
       y = expression(y)) +
  theme(legend.position = c(0.15, 0.9))
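Numerically, a quick sketch using the simulated df from above shows the attenuation directly:

# naive least-squares fit on the censored-at-the-limit values: slope biased towards zero
coef(lm(y_obs ~ 1 + x, data = df))["x"]
# fit on the (normally unobservable) true values: close to the true slope of 0.1
coef(lm(y ~ 1 + x, data = df))["x"]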

Fitting a left-censored model using R

We can fit a model in R that accounts for the censoring mechanism by writing down the likelihood and passing it to optim to find the maximum:5

log_lik <- function(parameter_vector, x, y, l) {
  a <- parameter_vector[1]
  b <- parameter_vector[2]
  sigma <- parameter_vector[3]
  
  linear_predictor <- a + b*x
  
  # you could also do this more explicitly with an indicator variable and a multiplication 
  ll_vec <- ifelse(y > l, 
                   dnorm(y, linear_predictor, sigma, log = TRUE),
                   pnorm((l - linear_predictor)/sigma, log.p = TRUE))
  
  return(sum(ll_vec))
}

# initialize with a fit from the observed data, inits <- runif(3) will also work for simple models
initmod <- lm(y_obs ~ 1 + x, data = df)
print(c(coef(initmod), sigma = summary(initmod)$sigma))
(Intercept)           x       sigma 
 0.09927942  0.06638911  0.38652839 
init <- c(coef(initmod), sigma = summary(initmod)$sigma)

mle <- optim(init, log_lik, lower = c(-Inf,-Inf, 1.e-5),
             method = "L-BFGS-B", control = list(fnscale = -1), 
             x = df[['x']], y = df[['y_obs']], l = censor_point)
mle$par
 (Intercept)            x        sigma 
0.0005055924 0.0974616424 0.5137657833 

We can see that the procedure managed to recover the true values of \(\sigma\) (= 0.5) and the slope \(b\) (= 0.1).
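If you also want standard errors, one option (a minimal sketch, not part of the original post) is to minimize the negative log-likelihood with hessian = TRUE and invert the observed information matrix:

# wrap the log-likelihood so optim minimizes the negative log-likelihood
neg_log_lik <- function(parameter_vector, x, y, l) {
  -log_lik(parameter_vector, x, y, l)
}

fit <- optim(init, neg_log_lik, lower = c(-Inf, -Inf, 1.e-5),
             method = "L-BFGS-B", hessian = TRUE,
             x = df[['x']], y = df[['y_obs']], l = censor_point)

# the Hessian of the minimized negative log-likelihood is the observed
# information; its inverse approximates the covariance matrix of the MLE
sqrt(diag(solve(fit$hessian)))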

Of course, there are also R packages, both frequentist and Bayesian, that can do such a procedure for us. Below is an overview:

{survival}

In {survival} you can specify left-censored models just as you would specify right-censored survival models, using the Surv object. The only difference is that we assume a regular Gaussian distribution for the outcome here. Moreover, we need to specify which observations are left-censored and at what value. This censor_point can also be a vector, thereby allowing your observations to be censored at different levels/limits (a sketch of this follows below the model output).

m <- survival::survreg(survival::Surv(y_obs,
                                      y_obs > censor_point,
                                      type = "left") ~ 1 + x,
                       data = df,
                       dist = "gaussian")
summary(m)

Call:
survival::survreg(formula = survival::Surv(y_obs, y_obs > censor_point, 
    type = "left") ~ 1 + x, data = df, dist = "gaussian")
                Value Std. Error      z      p
(Intercept)  0.000506   0.012272   0.04   0.97
x            0.097462   0.011836   8.23 <2e-16
Log(scale)  -0.665992   0.020230 -32.92 <2e-16

Scale= 0.514 

Gaussian distribution
Loglik(model)= -1587.6   Loglik(intercept only)= -1621.4
    Chisq= 67.57 on 1 degrees of freedom, p= 2e-16 
Number of Newton-Raphson Iterations: 3 
n= 2000 
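Because the Surv() response carries the recorded value for every observation, varying detection limits only require that each censored observation is stored at its own limit. A sketch with a hypothetical per-observation limit lod (made up here purely for illustration):

# each observation gets its own (illustrative) detection limit
df_var <- df |>
  dplyr::mutate(lod = censor_point + 0.1 * (x > 0),
                observed = y > lod,
                y_obs_var = ifelse(observed, y, lod))

survival::survreg(survival::Surv(y_obs_var, observed, type = "left") ~ 1 + x,
                  data = df_var, dist = "gaussian")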

You can get the same result by specifying the left-censoring as a special case of interval censoring:

m <- survival::survreg(survival::Surv(time = ifelse(y_obs == censor_point, -999, y_obs), 
                                      time2 = censor_point,
                                      event = ifelse(y_obs == censor_point, 3, 1), 
                                      type = "interval") ~ 1 + x, 
                       data = df,
                       dist = "gaussian")
summary(m)

Call:
survival::survreg(formula = survival::Surv(time = ifelse(y_obs == 
    censor_point, -999, y_obs), time2 = censor_point, event = ifelse(y_obs == 
    censor_point, 3, 1), type = "interval") ~ 1 + x, data = df, 
    dist = "gaussian")
                Value Std. Error      z      p
(Intercept)  0.000506   0.012272   0.04   0.97
x            0.097462   0.011836   8.23 <2e-16
Log(scale)  -0.665992   0.020230 -32.92 <2e-16

Scale= 0.514 

Gaussian distribution
Loglik(model)= -1587.6   Loglik(intercept only)= -1621.4
    Chisq= 67.57 on 1 degrees of freedom, p= 2e-16 
Number of Newton-Raphson Iterations: 6 
n= 2000 

There’s also a lognormal distribution available in {survival}.
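For strictly positive concentration data (our simulated df contains negative values, so conc_df, conc_obs, and lod below are hypothetical), only the dist argument changes:

# sketch: left-censored lognormal model for positive concentrations
survival::survreg(survival::Surv(conc_obs, conc_obs > lod, type = "left") ~ 1 + x,
                  data = conc_df, dist = "lognormal")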

{censReg}

{censReg} does not support varying censoring points: its left argument only accepts a single constant, whereas {survival} handles this because the Surv() response can carry a different limit for each censored observation.

m <- censReg::censReg(y_obs ~ 1 + x, 
                      data = df, 
                      method = "BHHH", 
                      left = min(df$y_obs), right = Inf)
summary(m)

Call:
censReg::censReg(formula = y_obs ~ 1 + x, left = min(df$y_obs), 
    right = Inf, data = df, method = "BHHH")

Observations:
         Total  Left-censored     Uncensored Right-censored 
          2000            600           1400              0 

Coefficients:
              Estimate Std. error t value  Pr(> t)    
(Intercept)  0.0005067  0.0122142   0.041    0.967    
x            0.0974645  0.0119876   8.130 4.28e-16 ***
logSigma    -0.6659855  0.0209399 -31.805  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

BHHH maximisation, 17 iterations
Return code 8: successive function values within relative tolerance limit (reltol)
Log-likelihood: -1587.58 on 3 Df

Multilevel left-censored models

Sometimes your data has more structure and you want to model different levels, for example a random intercept for the subject or batch an observation belongs to. Your R package options are more limited in that case, because the frequentist approach now additionally has to integrate over the random effects to obtain the marginal likelihood. The {survival} package does not support such models, while {censReg} supports only a simple two-level random intercept model.
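Written out for a single random intercept \(u_j\) per batch \(j\), the marginal likelihood a frequentist fitter has to maximize becomes

\[ L(a, b, \sigma, \sigma_u) = \prod_{j} \int \left[ \prod_{i \in j} p\left(y_{ij}^* \mid a + b x_{ij} + u_j, \sigma\right) \right] \mathrm{N}\left(u_j \mid 0, \sigma_u^2\right) \mathrm{d} u_j \]

where \(p(\cdot)\) denotes the censoring-aware contribution from before: the normal density for uncensored observations and \(\Phi\) evaluated at the limit for censored ones. Let's simulate such batch-structured data: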

samples_p_batch <- 100
nr_batches <- n/samples_p_batch
sd_batch <- 2

re_batch <- rep(rnorm(nr_batches, 0, sd_batch), each = samples_p_batch)
y <-  rnorm(n, x * b_1 + re_batch, sd)

censor_point <- quantile(y, prop_censored)
censored <- y <= censor_point
y_obs <- ifelse(censored, censor_point, y)

df <- tibble::tibble(y, y_obs, x, censor_point, 
                     index = rep(1:samples_p_batch, times = nr_batches),
                     batch_id = rep(1:nr_batches, each = samples_p_batch))

{censReg}

To get such a two-level random intercept model to work in {censReg}, we first have to turn our data frame into a panel data frame with {plm}6

pData <- plm::pdata.frame(df, c( "batch_id", "index"))

m <- censReg::censReg(y_obs ~ 1 + x, 
                      data = pData, 
                      method = "BHHH", 
                      left = min(df$y_obs), right = Inf)
summary(m)

Call:
censReg::censReg(formula = y_obs ~ 1 + x, left = min(df$y_obs), 
    right = Inf, data = pData, method = "BHHH")

Observations:
         Total  Left-censored     Uncensored Right-censored 
          2000            600           1400              0 

Coefficients:
             Estimate Std. error t value  Pr(> t)    
(Intercept) -0.091705   0.004090 -22.423  < 2e-16 ***
x            0.083038   0.016855   4.927 8.37e-07 ***
logSigmaMu   0.454964   0.002405 189.172  < 2e-16 ***
logSigmaNu  -0.474995   0.007687 -61.794  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

BHHH maximisation, 150 iterations
Return code 4: Iteration limit exceeded (iterlim)
Log-likelihood: -1475.026 on 4 Df

{GLMMadaptive}

One other frequentist package available for your left-censored models is {GLMMadaptive}. In addition to your observed outcome, you need to specify a vector with an indicator variable that describes whether the observation was censored:

m <- GLMMadaptive::mixed_model(cbind(y_obs, censored) ~ 1 + x, 
                               random = ~ 1|batch_id, 
                               data = df,
                               family = GLMMadaptive::censored.normal())
summary(m)

Call:
GLMMadaptive::mixed_model(fixed = cbind(y_obs, censored) ~ 1 + 
    x, random = ~1 | batch_id, data = df, family = GLMMadaptive::censored.normal())

Data Descriptives:
Number of Observations: 2000
Number of Groups: 20 

Model:
 family: censored normal
 link: identity 

Fit statistics:
   log.Lik      AIC      BIC
 -1196.614 2401.229 2405.212

Random effects covariance matrix:
             StdDev
(Intercept) 1.89308

Fixed effects:
            Estimate Std.Err z-value p-value
(Intercept)  -0.2812  0.4201 -0.6694 0.50326
x             0.0742  0.0126  5.8995 < 1e-04

log(residual std. dev.):
  Estimate Std.Err
   -0.6983  0.0192

Integration:
method: adaptive Gauss-Hermite quadrature rule
quadrature points: 11

Optimization:
method: hybrid EM and quasi-Newton
converged: TRUE 

We can also use random slopes in {GLMMadaptive}, but only a single grouping factor is supported at the moment (so no nested or crossed random-effects designs). {GLMMadaptive} is relatively fast. Unfortunately, when I had to fit more than 20 of these models for projects, at least one of them would run into convergence issues or errors. But perhaps someone with a better feel for the optimizer settings will be more successful in fitting these models at scale.
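A random-slope version (a sketch, not fitted in this post) would look like this:

# random intercept and random slope for x, with batch_id as the single grouping factor
m_rs <- GLMMadaptive::mixed_model(cbind(y_obs, censored) ~ 1 + x,
                                  random = ~ 1 + x | batch_id,
                                  data = df,
                                  family = GLMMadaptive::censored.normal())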

And unlike {censReg}, you can extract the random effects:

GLMMadaptive::ranef(m) |> head()
  (Intercept)
1   0.3581646
2   1.2568000
3   0.7057757
4  -2.1809270
5  -0.6329443
6   3.2980978

{brms} / Stan

The addition of random intercepts and slopes is less challenging in a Bayesian framework because the cumbersome integration is avoided: the random effects are simply sampled along with the other parameters. Similar to {GLMMadaptive}, we specify an additional indicator variable for our {brms} model, but this time left-censoring is indicated by \(-1\) (uncensored is \(0\) in brms and \(1\) indicates right-censoring):

m <- brms::brm(y_obs | cens(censored) ~ 1 + x + (1|batch_id), 
               data = df |> dplyr::mutate(censored = censored * -1), 
               seed = 2025, refresh = 1000,
               backend = "cmdstanr", cores = 4)
Running MCMC with 4 parallel chains...

Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 2 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 2 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 3 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 3 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 4 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 4 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 2 finished in 27.7 seconds.
Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 1 finished in 28.6 seconds.
Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 4 finished in 30.8 seconds.
Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 3 finished in 31.8 seconds.

All 4 chains finished successfully.
Mean chain execution time: 29.7 seconds.
Total execution time: 32.0 seconds.
summary(m)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y_obs | cens(censored) ~ 1 + x + (1 | batch_id) 
   Data: dplyr::mutate(df, censored = censored * -1) (Number of observations: 2000) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~batch_id (Number of levels: 20) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     2.02      0.33     1.48     2.78 1.01      513      776

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -0.27      0.43    -1.19     0.54 1.01      408      511
x             0.07      0.01     0.05     0.10 1.00     1719     2001

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.50      0.01     0.48     0.52 1.00     1585     1527

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

If we print the actual Stan code we see familiar elements from our earlier likelihood:

brms::stancode(m)
// generated with brms 2.22.0
functions {
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  // censoring indicator: 0 = event, 1 = right, -1 = left, 2 = interval censored
  array[N] int<lower=-1,upper=2> cens;
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int<lower=1> Kc;  // number of population-level effects after centering
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  array[N] int<lower=1> J_1;  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  // indices of censored data
  int Nevent = 0;
  int Nrcens = 0;
  int Nlcens = 0;
  array[N] int Jevent;
  array[N] int Jrcens;
  array[N] int Jlcens;
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  // collect indices of censored data
  for (n in 1:N) {
    if (cens[n] == 0) {
      Nevent += 1;
      Jevent[Nevent] = n;
    } else if (cens[n] == 1) {
      Nrcens += 1;
      Jrcens[Nrcens] = n;
    } else if (cens[n] == -1) {
      Nlcens += 1;
      Jlcens[Nlcens] = n;
    }
  }
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b;  // regression coefficients
  real Intercept;  // temporary intercept for centered predictors
  real<lower=0> sigma;  // dispersion parameter
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  array[M_1] vector[N_1] z_1;  // standardized group-level effects
}
transformed parameters {
  vector[N_1] r_1_1;  // actual group-level effects
  real lprior = 0;  // prior contributions to the log posterior
  r_1_1 = (sd_1[1] * (z_1[1]));
  lprior += student_t_lpdf(Intercept | 3, 0.1, 2.5);
  lprior += student_t_lpdf(sigma | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
  lprior += student_t_lpdf(sd_1 | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = rep_vector(0.0, N);
    mu += Intercept + Xc * b;
    for (n in 1:N) {
      // add more terms to the linear predictor
      mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
    }
    // vectorized log-likelihood contributions of censored data
    target += normal_lpdf(Y[Jevent[1:Nevent]] | mu[Jevent[1:Nevent]], sigma);
    target += normal_lccdf(Y[Jrcens[1:Nrcens]] | mu[Jrcens[1:Nrcens]], sigma);
    target += normal_lcdf(Y[Jlcens[1:Nlcens]] | mu[Jlcens[1:Nlcens]], sigma);
  }
  // priors including constants
  target += lprior;
  target += std_normal_lpdf(z_1[1]);
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
}

With {brms} the possibilities are almost endless7 and fortunately its speed has also noticeably improved over the last few years. My censored models usually converge out of the box, even with multiple random intercept terms. And with some reasonably strong priors I have also had success with high proportions of censored data (say >80%) and multiple interaction terms.
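As an illustration of what I mean by ‘reasonably strong’ priors (these particular choices are purely illustrative, not the exact priors used in those projects):

# sketch: mildly informative priors on the slopes and the standard deviations
priors <- c(brms::prior(normal(0, 1), class = "b"),
            brms::prior(exponential(2), class = "sd"),
            brms::prior(exponential(2), class = "sigma"))

m_informative <- brms::brm(y_obs | cens(censored) ~ 1 + x + (1 | batch_id),
                           data = df |> dplyr::mutate(censored = censored * -1),
                           prior = priors,
                           seed = 2025, backend = "cmdstanr", cores = 4)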

{MCMCglmm}

{MCMCglmm} also supports left-censored multilevel (more than two-level?) models. The random-effects notation is a bit different though: random = ~ batch_id gives an intercept that varies by batch (a random intercept), random = ~us(1+fixed_effect):cluster gives a random intercept/slope model with an estimated covariance, and random = ~idh(1+fixed_effect):cluster is the same but with the covariance fixed to 0. We specify a left-censored outcome by creating two variables: if an observation is left-censored, y_obs_min is -Inf and y_obs_max takes the value of the censoring point; in the uncensored cases y_obs_min and y_obs_max are identical and take the observed value:

library(MCMCglmm)

m <- MCMCglmm::MCMCglmm(cbind(y_obs_min, y_obs_max) ~ 1 + x , 
                        random = ~ batch_id, 
                        family = "cengaussian", 
                        data = df |> 
                          dplyr::mutate(y_obs_min = ifelse(censored == 1, -Inf, y_obs),
                                        y_obs_max = ifelse(censored == 1, censor_point, y_obs)) |>
                          data.frame(),
                        nitt = 20000, thin = 1, burnin = 10000,
                        verbose = FALSE)
summary(m)

 Iterations = 10001:20000
 Thinning interval  = 1
 Sample size  = 10000 

 DIC: 2077.777 

 G-structure:  ~batch_id

         post.mean l-95% CI u-95% CI eff.samp
batch_id     4.123    1.728    7.005     3863

 R-structure:  ~units

      post.mean l-95% CI u-95% CI eff.samp
units    0.2479   0.2298   0.2675     2027

 Location effects: cbind(y_obs_min, y_obs_max) ~ 1 + x 

            post.mean l-95% CI u-95% CI eff.samp  pMCMC    
(Intercept)  -0.27949 -1.11097  0.68806     9673  0.518    
x             0.07432  0.04962  0.09862     4620 <1e-04 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The G structure refers to the random effect structure, while the R structure is the residual structure.

I think by default you do not get easily interpretable scale parameters – the output reports variances – but you can obtain standard deviations by taking the square root of the posterior draws of the (co)variance components:

summary(sqrt(m$VCV))[[1]]
              Mean          SD     Naive SE Time-series SE
batch_id 2.0001114 0.350291352 3.502914e-03   0.0057400762
units    0.4978472 0.009719887 9.719887e-05   0.0002157377

{INLA}

Lastly, I want to mention the {INLA} package. It is Bayesian, but instead of sampling it approximates the posterior marginals directly (integrated nested Laplace approximations). It’s mostly used in spatial statistics, so perhaps this makes it the ideal candidate for censored concentration measurements with a spatial element, such as data from a measurement campaign. Its interface is slightly unusual compared to the R regression packages I’m familiar with, but it is very, very fast! Many censored likelihood families are supported, but unfortunately the Gaussian family is not one of them:

names(INLA::inla.models()$likelihood) |> 
    purrr::keep(~stringr::str_detect(.x, "surv"))
[1] "gammasurv"        "gammajwsurv"      "qloglogisticsurv" "lognormalsurv"   
[5] "exponentialsurv"  "weibullsurv"      "loglogisticsurv"  "fmrisurv"        
[9] "gompertzsurv"    

A censored gamma and a lognormal family8 are present though, and these are also useful in environmental studies.

You specify censoring using an inla.surv object, where left-censoring is coded as \(2\) and uncensored observations get a \(1\) indicator:

library(INLA)

# simulate lognormal data
samples_p_batch <- 100
nr_batches <- n/samples_p_batch
sd_batch <- 2

re_batch <- rep(rnorm(nr_batches, 0, sd_batch), each = samples_p_batch)
y <-  rlnorm(n, x * b_1 + re_batch, sd)

censor_point <- quantile(y, prop_censored)
censored <- y <= censor_point
y_obs <- ifelse(censored, censor_point, y)

df <- tibble::tibble(y, y_obs, x, censor_point, 
                     index = rep(1:samples_p_batch, times = nr_batches),
                     batch_id = rep(1:nr_batches, each = samples_p_batch)) |> 
  dplyr::mutate(censored = ifelse(censored == 1, 2, 1))

surv_obj <- inla.surv(df$y_obs, df$censored)
m <- inla(surv_obj ~ 1 + x + f(batch_id, model = "iid"), 
          data = df, 
          family = "lognormal.surv", control.compute = list(config = TRUE))
summary(m)

Call:
   c("inla.core(formula = formula, family = family, contrasts = contrasts, 
   ", " data = data, quantiles = quantiles, E = E, offset = offset, ", " 
   scale = scale, weights = weights, Ntrials = Ntrials, strata = strata, 
   ", " lp.scale = lp.scale, link.covariates = link.covariates, verbose = 
   verbose, ", " lincomb = lincomb, selection = selection, control.compute 
   = control.compute, ", " control.predictor = control.predictor, 
   control.family = control.family, ", " control.inla = control.inla, 
   control.fixed = control.fixed, ", " control.mode = control.mode, 
   control.expert = control.expert, ", " control.hazard = control.hazard, 
   control.lincomb = control.lincomb, ", " control.update = 
   control.update, control.lp.scale = control.lp.scale, ", " 
   control.pardiso = control.pardiso, only.hyperparam = only.hyperparam, 
   ", " inla.call = inla.call, inla.arg = inla.arg, num.threads = 
   num.threads, ", " keep = keep, working.directory = working.directory, 
   silent = silent, ", " inla.mode = inla.mode, safe = FALSE, debug = 
   debug, .parent.frame = .parent.frame)" ) 
Time used:
    Pre = 0.419, Running = 0.403, Post = 0.111, Total = 0.933 
Fixed effects:
             mean    sd 0.025quant 0.5quant 0.975quant  mode kld
(Intercept) 0.374 0.493     -0.613    0.379      1.335 0.378   0
x           0.098 0.013      0.073    0.098      0.123 0.098   0

Random effects:
  Name    Model
    batch_id IID model

Model hyperparameters:
                                              mean    sd 0.025quant 0.5quant
Precision for the lognormalsurv observations 3.985 0.153      3.679    3.987
Precision for batch_id                       0.224 0.074      0.105    0.215
                                             0.975quant  mode
Precision for the lognormalsurv observations      4.280 4.001
Precision for batch_id                            0.392 0.198

Marginal log-Likelihood:  -3353.37 
 is computed 
Posterior summaries for the linear predictor and the fitted values are computed
(Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')

The random-effects estimates are expressed on the precision scale, but you can do the following to transform them back to variances (and, for the posterior mean, a standard deviation) on a familiar scale:9

v1 <- m |> purrr::pluck("internal.marginals.hyperpar", 
                        "Log precision for batch_id")
v2 <- m |> purrr::pluck("internal.marginals.hyperpar", 
                        "Log precision for the lognormalsurv observations")

v1p <- inla.tmarginal(function(x) 1/exp(x), v1) 
v2p <- inla.tmarginal(function(x) 1/exp(x), v2)

r <- dplyr::bind_rows(
  inla.zmarginal(v1p) |> tibble::as_tibble() |> dplyr::mutate(variable = "batch_id"),
  inla.zmarginal(v2p) |> tibble::as_tibble() |> dplyr::mutate(variable = "residual"))
r |> 
  dplyr::mutate(mean = sqrt(mean))  # square root of the posterior mean variance, i.e. roughly a standard deviation
# A tibble: 2 × 8
   mean      sd quant0.025 quant0.25 quant0.5 quant0.75 quant0.975 variable
  <dbl>   <dbl>      <dbl>     <dbl>    <dbl>     <dbl>      <dbl> <chr>   
1 2.23  1.77         2.56      3.73     4.63      5.87       9.41  batch_id
2 0.501 0.00965      0.234     0.245    0.251     0.258      0.272 residual

You can also access the posterior through INLA::inla.posterior.sample.
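For example (a sketch), since we set control.compute = list(config = TRUE) above:

# draw from the approximate joint posterior
post <- INLA::inla.posterior.sample(n = 1000, result = m)
str(post[[1]], max.level = 1)  # each draw contains the hyperparameters, the latent field and its log-density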

In my experience, {INLA} needs – compared to {brms} – much stronger priors on random intercept terms in a censored (lognormal) model. I have yet to encounter a real-life scenario where it could reliably get by with a noninformative prior. Perhaps this is simply a consequence of the lognormal family, or of the combination of the lognormal family and some difference in how the parameter space is defined that I’m not aware of, but I have often had to specify priors like these

# half-Cauchy(scale = 0.1) prior on the random-effect standard deviation,
# written on INLA's internal log-precision scale (theta = log precision)
halfcauchy <- "expression:
  sigma = exp(-theta/2);
  gamma = 0.1;
  log_dens = log(2) - log(pi) - log(gamma);
  log_dens = log_dens - log(1 + (sigma / gamma)^2);
  log_dens = log_dens - log(2) - theta / 2;
  return(log_dens);
"
hcpriorlist <- list(prec = list(prior = halfcauchy))

m <- inla(surv_obj ~ 1 + f(batch_id, model = "iid", hyper = hcpriorlist), 
          data = df, family = "lognormal.surv", 
          control.compute = list(config = TRUE))

for good frequentist coverage properties – with weaker priors there would be huge outliers, where some simulations produced an estimated mean a couple of orders of magnitude higher than expected.

Other packages

There are other packages available to fit these kinds of models that I have not covered here: {nlmixr2}, {ARpLMEC}, {lme4cens}, {gamlss} (supposedly also multilevel?), and {lmec}, but they are either minimally used or minimally validated10, or were somewhat clunky for me to use. There is also talk of supporting censored regression models in {glmmTMB}. And there’s the {AER} package, but it seems to be just a wrapper around the {survival} library with less functionality (i.e. no varying censoring points).

References

Gelman, A., and J. Hill. 2006. Data Analysis Using Regression and Multilevel/Hierarchical Models. Analytical Methods for Social Research. Cambridge University Press. https://sites.stat.columbia.edu/gelman/arm/.
Gelman, Andrew et al. 2013. Bayesian Data Analysis. Third. Chapman & Hall/CRC Texts in Statistical Science Series. CRC. https://stat.columbia.edu/~gelman/book/.
Gelman, Andrew, Jennifer Hill, and Aki Vehtari. 2021. Regression and Other Stories. Cambridge University Press. https://avehtari.github.io/ROS-Examples/.
Helsel, D. R. 2005. Nondetects and Data Analysis: Statistics for Censored Environmental Data. Statistics in Practice. Wiley. https://books.google.nl/books?id=krwQAQAAIAAJ.
Helsel, Dennis R. 2006. “Fabricating Data: How Substituting Values for Nondetects Can Ruin Results, and What Can Be Done about It.” Chemosphere, Environmental Chemistry, 65 (11): 2434–39. doi:10.1016/j.chemosphere.2006.04.051.
Senn, Stephen, Nick Holford, and Hans Hockey. 2012. “The Ghosts of Departed Quantities: Approaches to Dealing with Observations Below the Limit of Quantitation.” Statistics in Medicine 31 (30). Wiley Online Library: 4280–95. https://doi.org/10.1002/sim.5515.

Footnotes

  1. Not to be confused with left truncation. In both truncation and censoring no information on the value of the observation beyond the truncation/censoring point is available. However, with censoring we know the number of observations beyond this point contrary to truncation where we do not observe any of the dependent variable or covariates – the entry is simply missing from your data table. See for example Senn, Holford, and Hockey (2012) and Andrew Gelman et al. (2013), p.224 for a more in depth explanation on the statistical connections between truncation and censoring.↩︎

  2. While I have not encountered it myself, I have read that (some) mass spectrometry instruments can also reach saturation. This will not be the topic of this post. I think, however, that saturation would introduce an additional right-censoring mechanism or some type of non-linear effect – though again, I’m not sure how this is handled in practice.↩︎

  3. In this post we assume the limit of detection is known, but this need not always be the case. You can of course model this too as an unknown random variable. See for example Andrew Gelman et al. (2013), p.226 on how to handle this.↩︎

  4. The classical tobit model from economics assumes a normal distribution for the outcome variable, with left-censoring at 0.↩︎

  5. Via A. Gelman and Hill (2006) and Michael Clark’s post.↩︎

  6. Though there seem to be some questions about whether {censReg} is doing this correctly (per the author of the {GLMMadaptive} package).↩︎

  7. Unfortunately no censored predictors yet.↩︎

  8. Not entirely sure if the random effects also follow a lognormal distribution here. My simulations (not shown) suggest a lognormal rather than a normal distribution, so this would be different from log-transforming the outcome and using our {brms} model.↩︎

  9. See for example https://becarioprecario.bitbucket.io/inla-gitbook/ch-survival.html and https://www.paulamoraga.com/book-geospatial/sec-inla.html for more information on how to work with {INLA}.↩︎

  10. We are probably overconfident in statistical software and do not evaluate them sufficiently before using them in our work. See this post from Alex Hayes for some thoughts on checking statistical software.↩︎

Citation

BibTeX citation:
@online{oosterwegel2025,
  author = {Oosterwegel, Max J.},
  title = {A Guide to Fitting Left-Censored Models with {R}},
  date = {2025-04-21},
  url = {https://maxoosterwegel.com/blog/left-censored-models-showcase/},
  doi = {placeholder},
  langid = {en}
}
For attribution, please cite this work as:
Oosterwegel, Max J. 2025. “A Guide to Fitting Left-Censored Models with R.” April 21. doi:placeholder.